Stacking for Statistical Machine Translation
نویسندگان
چکیده
We propose the use of stacking, an ensemble learning technique, to the statistical machine translation (SMT) models. A diverse ensemble of weak learners is created using the same SMT engine (a hierarchical phrase-based system) by manipulating the training data and a strong model is created by combining the weak models on-the-fly. Experimental results on two language pairs and three different sizes of training data show significant improvements of up to 4 BLEU points over a conventionally trained SMT model.
منابع مشابه
A new model for persian multi-part words edition based on statistical machine translation
Multi-part words in English language are hyphenated and hyphen is used to separate different parts. Persian language consists of multi-part words as well. Based on Persian morphology, half-space character is needed to separate parts of multi-part words where in many cases people incorrectly use space character instead of half-space character. This common incorrectly use of space leads to some s...
متن کاملMeta-level Statistical Machine Translation
We propose a simple and effective method to build a meta-level Statistical Machine Translation (SMT), called meta-SMT, for system combination. Our approach is based on the framework of Stacked Generalization, also known as Stacking, which is an ensemble learning algorithm, widely used in machine learning tasks. First, a collection of base-level SMTs is generated for obtaining a meta-level corpu...
متن کاملThe Correlation of Machine Translation Evaluation Metrics with Human Judgement on Persian Language
Machine Translation Evaluation Metrics (MTEMs) are the central core of Machine Translation (MT) engines as they are developed based on frequent evaluation. Although MTEMs are widespread today, their validity and quality for many languages is still under question. The aim of this research study was to examine the validity and assess the quality of MTEMs from Lexical Similarity set on machine tra...
متن کاملLeveraging Diverse Sources in Statistical Machine Translation
Statistical machine translation is often faced with the problem of having insufficient training data for many language pairs. In this thesis, several methods have been proposed to leverage other sources to enhance the quality of machine translation systems. Particularly, we propose approaches suitable in these four scenarios: 1. when an additional parallel corpus between the source and the targ...
متن کاملRefining Word Alignment with Discriminative Training
The quality of statistical machine translation systems depends on the quality of the word alignments that are computed during the translation model training phase. IBM alignment models, as implemented in the GIZA++ toolkit, constitute the de facto standard for performing these computations. The resulting alignments and translation models are however very noisy, and several authors have tried to...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2013